Arborest – a Growing Treebank of Estonian

Authors

  • Eckhard Bick
  • Heli Uibo
  • Kaili Müürisep

Abstract

Treebank creation is a very labor-intensive task, especially if the intended applications include machine learning, gold-standard parser evaluation or teaching, since only a manually checked, syntactically annotated corpus can provide optimal support for these purposes. There are, however, ways to make the annotation process (partly) automatic, saving (manual) annotation time and/or allowing the creation of larger corpora. Whenever possible, existing resources – both corpora and grammars – should be reused.

Similar resources

Arborest – a VISL-Style Treebank Derived from an Estonian Constraint Grammar Corpus

Treebank creation is a very labor-consuming task, especially if the applications intended include machine learning, gold standard parser evaluation or teaching, since only a manually checked syntactically annotated corpus can provide optimal support for these purposes. There are, however, possibilities to make the annotation process (partly) automatic, saving (manual) annotation time and/or all...

Estonian Copular and Existential Constructions as an UD Annotation Problem

This article is about annotating clauses with nonverbal predication in version 2 of the Estonian UD treebank. Three possible annotation schemas are discussed, among which separating existential clauses from copular clauses would be theoretically most sound but would need too much manual labor and could possibly yield inconsistent annotation. Therefore, a solution has been adopted which separates ex...

Estonian Dependency Treebank: from Constraint Grammar tagset to Universal Dependencies

This paper presents the first version of the Estonian Universal Dependencies Treebank, which has been semi-automatically acquired from the Estonian Dependency Treebank and comprises ca 400,000 words (ca 30,000 sentences) representing the genres of fiction, newspapers and scientific writing. The article analyses the differences between the two annotation schemes and the conversion procedure to Universal Dependen...

Syntactically annotated corpora of Estonian

Syntactically annotated corpora are needed 1) to train and test parsers and various language technology products (grammar checkers, information retrievers and extractors, machine translators, etc.); 2) to check the agreement of existing linguistic theories with real language usage. The corpora can be annotated on different levels of depth. In shallow syntactically annotated corpora a syntact...

Study of the effect of Estonian and aqueous extract of Persian walnut tree leaf (Juglans regia) on growth indicators in western white shrimp farmed (Litopenaeus vannamei)

The aim of this study was to investigate the effects of Estonian and aqueous extracts of Persian walnut leaves on the performance of growth indices in western white shrimp (Litopenaeus vannamei). Materials and methods included 6 treatments of shrimp with different concentrations of 100, 200 and 300 mg/kg aqueous and Estonian extracts of Persian walnut leaves in the diet and 2 negative control t...

Publication date: 2004